Introduction

Dataset: https://catalog.data.gov/dataset/meteorite-landings

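This data set contains meteorite landing data from The Meteoritical Society and NASA. The source description cites 34,513 confirmed meteorite landings around the globe; the file analyzed here has 45,716 rows, of which 38,115 remain after rows with missing values are removed (see the summary and structure output below). The data was last updated in November 2020.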



Summary

Each meteorite in the data set has the following fields: Name, ID, NameType, Class, Mass (g), Fall, Year, Latitude, Longitude, and GeoLocation. Each meteorite has its own unique Name and ID. NameType has two levels: Valid and Relict, where relict meteorites “were once meteorites but are now highly altered by weathering on Earth”. Class is the meteorite's classification (for example L6 or H5). Mass is given in grams. Fall has two levels: Fell and Found, where Fell means the fall itself was observed and Found means the meteorite was recovered without its fall having been witnessed. Year is the year the meteorite fell or was found. Latitude and Longitude give each meteorite's coordinates, and GeoLocation combines them into a single "(latitude, longitude)" string.

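library(ggplot2)   # qplot, ggplot
library(ggthemes)  # theme_solarized
library(plot3D)    # scatter3D
# landings is assumed to already hold the CSV from the dataset page,
# e.g. landings = read.csv("Meteorite_Landings.csv") (file name is an assumption)
lands = na.omit(landings) #removes rows containing NA
colnames(lands) = c("Name","ID","NameType","Class","Mass","Fall","Year","Latitude","Longitude", "GeoLocation")
lands$Class = as.factor(lands$Class)
lands$Fall = as.factor(lands$Fall)
#lands$Year = as.factor(lands$Year)
lands$NameType = as.factor(lands$NameType)
summary(lands)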
##      Name                 ID          NameType         Class      
##  Length:38115       Min.   :    1   Relict:   21   L6     : 7519  
##  Class :character   1st Qu.:10832   Valid :38094   H5     : 6243  
##  Mode  :character   Median :21732                  H6     : 3898  
##                     Mean   :25343                  H4     : 3880  
##                     3rd Qu.:39888                  L5     : 3264  
##                     Max.   :57458                  LL5    : 2199  
##                                                    (Other):11112  
##       Mass             Fall            Year         Latitude     
##  Min.   :       0   Fell : 1065   Min.   : 860   Min.   :-87.37  
##  1st Qu.:       7   Found:37050   1st Qu.:1986   1st Qu.:-76.72  
##  Median :      29                 Median :1996   Median :-71.50  
##  Mean   :   15601                 Mean   :1990   Mean   :-39.60  
##  3rd Qu.:     187                 3rd Qu.:2002   3rd Qu.:  0.00  
##  Max.   :60000000                 Max.   :2101   Max.   : 81.17  
##                                                                  
##    Longitude       GeoLocation       
##  Min.   :-165.43   Length:38115      
##  1st Qu.:   0.00   Class :character  
##  Median :  35.67   Mode  :character  
##  Mean   :  61.31                     
##  3rd Qu.: 157.17                     
##  Max.   : 178.20                     
## 


Structure

Structure of each field within the data set.

str(lands)
## 'data.frame':    38115 obs. of  10 variables:
##  $ Name       : chr  "Aachen" "Aarhus" "Abee" "Acapulco" ...
##  $ ID         : int  1 2 6 10 370 379 390 392 398 417 ...
##  $ NameType   : Factor w/ 2 levels "Relict","Valid": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Class      : Factor w/ 422 levels "Acapulcoite",..: 307 182 78 1 313 78 328 175 313 223 ...
##  $ Mass       : num  21 720 107000 1914 780 ...
##  $ Fall       : Factor w/ 2 levels "Fell","Found": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Year       : int  1880 1951 1952 1976 1902 1919 1949 1814 1930 1920 ...
##  $ Latitude   : num  50.8 56.2 54.2 16.9 -33.2 ...
##  $ Longitude  : num  6.08 10.23 -113 -99.9 -64.95 ...
##  $ GeoLocation: chr  "(50.775, 6.08333)" "(56.18333, 10.23333)" "(54.21667, -113.0)" "(16.88333, -99.9)" ...
##  - attr(*, "na.action")= 'omit' Named int [1:7601] 13 38 39 77 94 148 173 205 209 263 ...
##   ..- attr(*, "names")= chr [1:7601] "13" "38" "39" "77" ...
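
The na.action attribute at the end of the str output records the rows that na.omit removed: 7,601 rows dropped plus 38,115 rows kept gives 45,716 rows in the original file. The following is a minimal sketch of that bookkeeping on a toy data frame; the values and the names toy, clean and dropped are made up for illustration and are not taken from the meteorite data.

```r
# Toy data frame with hypothetical values (not from the meteorite data).
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Mass = c(21, NA, 107000, 780),
  Year = c(1880L, 1951L, NA, 1902L),
  stringsAsFactors = FALSE
)

clean   <- na.omit(toy)               # drop every row that contains at least one NA
dropped <- attr(clean, "na.action")   # indices of the rows that were removed

nrow(clean)                                   # number of rows kept
length(dropped)                               # number of rows removed
nrow(clean) + length(dropped) == nrow(toy)    # kept + removed equals the original row count
```

Checking this total before analysis is a quick way to see how much data a blanket na.omit discards.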


Max Weight

The meteorite with the greatest mass is Hoba, at 60,000,000 g (60 tonnes). Hoba is the largest meteorite found so far; it is thought to have fallen around 80,000 years ago in what is now Namibia, in southern Africa.

lands[which.max(lands$Mass),]
##       Name    ID NameType     Class     Mass  Fall Year  Latitude Longitude
## 16393 Hoba 11890    Valid Iron, IVB 60000000 Found 1920 -19.58333  17.91667
##                 GeoLocation
## 16393 (-19.58333, 17.91667)


Latitude vs. Longitude

World Map

In this chart I plotted Latitude vs. Longitude, where the size of each dot depends on the mass and the color shows the Fall value: Fell (the fall was observed) or Found.

qplot(data=lands, x = Longitude, y= Latitude, size=Mass, color=Fall)+ theme_solarized() +ggtitle("World-Wide Longitude vs. Latitude")


United States

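This chart filters Latitude and Longitude to zoom in on the United States. The filter (Longitude <= -50 and Latitude >= 0) is a rough bounding box, so it also includes Canada, Mexico, and Central America. Within the United States, more meteorites were found in the Midwest.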

US = lands[lands$Longitude <= -50,]
US = US[US$Latitude >=0,]
qplot(data=US, x = Longitude, y= Latitude, size=Mass, color=Fall)+ theme_solarized()+ ggtitle("U.S.A. Longitude vs. Latitude")
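
The two filtering steps above can also be written as a single bounding-box condition. The sketch below uses a toy data frame whose coordinates are rounded from the first rows of the str output; the names toy and box are hypothetical.

```r
# Toy coordinates, rounded from the first rows shown in the str output.
toy <- data.frame(
  Name      = c("Aachen", "Abee", "Acapulco", "Achiras"),
  Latitude  = c(50.8, 54.2, 16.9, -33.2),
  Longitude = c(6.08, -113.0, -99.9, -64.95),
  stringsAsFactors = FALSE
)

# Same thresholds as the text: Longitude <= -50 and Latitude >= 0,
# combined into one logical condition.
box <- subset(toy, Longitude <= -50 & Latitude >= 0)
box$Name
```

Acapulco passes the filter even though it lies in Mexico, which illustrates how coarse the box is; tighter bounds would be needed to isolate the United States.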


Yearly Visualizations

Year vs. Mass

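Meteorites plotted by Mass (x axis, limited to 0-4000 g) and Year (y axis, limited to 1600-2050), where color is the class (legend hidden). Most records are recent: according to the Year quartiles in the summary, three quarters of the meteorites date from 1986 or later.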

qplot(data = lands, x = Mass, y = Year, geom = "point",color = Class,show.legend = FALSE )+xlim(0,4000)+ylim(1600,2050)+ggtitle("Year vs. Mass")+ theme_solarized()


Histogram

Histogram of the Year values from 1850 to 1950, where the fill color shows whether the meteorite's Fall value is Fell or Found.

ggplot(lands, aes(Year, fill = Fall)) +
  geom_histogram(bins = 30,col=I("black")) + xlim(1850,1950)+ ggtitle("1850-1950 Histogram")+ theme_solarized() + ylab("Frequency")


NameType Frequency

Frequency by year of the 21 meteorites with the NameType Relict, drawn as a frequency polygon (geom_freqpoly) and colored by Class.

pops = lands[lands$NameType == "Relict",]
ggplot(pops, aes(x = Year, color = Class)) +geom_freqpoly(binwidth=2, size = 2) + xlim(1970, 2015) + theme_solarized()+ggtitle("NameType: Relict") + ylab("Frequency")

Linear Regression

Mass vs Year of Class L6

This chart fits a linear regression of Mass on Year using only class L6 meteorites. Mass is limited to 0-1000 g and Year to 1990-2000. You can see that most of the meteorites are small, with more being found in the later years of the range.

L6Class = lands[lands$Class == "L6",]
test  = L6Class[L6Class$Year >= 1990 & L6Class$Year <= 2000,]
test = test[test$Mass <= 1000,]
x = test$Year
y = test$Mass
lr <- lm(y~x)  # fit Mass as a linear function of Year
plot(x,y, main = "Linear Regression: Mass vs Year of L6", xlab = "Year: 1990-2000", ylab = "Mass: 0-1000g")
abline(lr, col=4)  # draw the fitted regression line
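
To read the trend as a number rather than from the plot, the fitted coefficients can be inspected directly. This is a minimal sketch on made-up data (the toy values, and the names toy, fit, slope and by_hand, are assumptions for illustration); it also shows that the fitted line is simply intercept + slope * Year.

```r
# Hypothetical masses for the years 1990-2000 (not taken from the data set).
toy <- data.frame(Year = 1990:2000)
toy$Mass <- 500 - 10 * (toy$Year - 1990) +
  c(5, -5, 5, -5, 5, 0, -5, 5, -5, 5, -5)   # small deterministic wiggle

fit <- lm(Mass ~ Year, data = toy)   # Mass as a linear function of Year

slope <- unname(coef(fit)["Year"])   # change in mass (g) per year
slope

# The regression line is intercept + slope * Year, which is what predict() returns.
by_hand <- unname(coef(fit)[1] + coef(fit)[2] * toy$Year)
all.equal(by_hand, unname(predict(fit)))
```

A negative slope would indicate that the masses in the selected range tend to decrease over time; for the real data, the sign and size of the slope should be checked with coef(lr).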


Facet

Class Comparison

This facet chart shows Mass vs. Year for the six classes with the highest counts (L6, H5, H6, H4, L5, and LL5). Year is limited to 1950-2000 and Mass to 0-2000 g. Each class is filtered into its own data frame and the results are then combined; L6Class is reused from the linear regression section. You can see that class LL5 meteorites tend to be smaller than those of the other classes.

H5Class = lands[lands$Class == "H5",]
L5Class = lands[lands$Class == "L5",]
H6Class = lands[lands$Class == "H6",]
H4Class = lands[lands$Class == "H4",]
LL5Class = lands[lands$Class == "LL5",]
Combo = rbind(H5Class, L6Class, L5Class, H6Class, H4Class, LL5Class)
#summary(test)
ggplot(data=Combo, aes(x=Year, y=Mass, color=Class)) + xlim(1950, 2000) + ylim(0,2000) + geom_point(size=2) + facet_grid(Class~.) + theme_solarized() + ggtitle("Class Comparison")
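
As an alternative to building one data frame per class, the most frequent classes can be picked from a frequency table and filtered in one step. The sketch below is one possible way to do this, shown on a toy data frame; toy, top_n, top_classes and combo are hypothetical names, and the values are made up.

```r
# Toy data with made-up masses; the class labels are borrowed from the summary.
toy <- data.frame(
  Class = factor(c("L6", "L6", "L6", "H5", "H5", "LL5", "Iron, IVB")),
  Mass  = c(10, 20, 30, 40, 50, 60, 70)
)

top_n <- 2  # number of classes to keep (6 in the chart above)

# Count rows per class, sort by count, and keep the names of the top classes.
top_classes <- names(sort(table(toy$Class), decreasing = TRUE))[seq_len(top_n)]

# Keep only rows in those classes and drop the now-unused factor levels,
# so that facets and legends do not show empty classes.
combo <- droplevels(toy[toy$Class %in% top_classes, ])
levels(combo$Class)
```

With top_n set to 6, the same pattern applied to lands would select the six classes used in the facet chart without naming them by hand.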


Clustering

Latitude vs. Longitude

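K-Means: the graph below plots the total within-cluster sum of squares (WSS) for 1 to 15 clusters. From the bend (elbow) in this curve I concluded that 3 clusters would be ideal for the latitude and longitude clustering chart.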

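mat <- cbind( lands$Longitude, lands$Latitude)  # coordinates only
mat = na.omit(mat)
clust = lands
set.seed(1)  # arbitrary seed so the random starts are reproducible
wss <- rep(0,15)
for (k in 1:15)
  wss[k] <- sum( kmeans(mat,centers=k, nstart=50)$withinss)  # total WSS for k clusters
plot(wss, type="b", main = "K-Means", xlab = "Number of clusters (k)", ylab = "WSS" ) 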


From this scatter plot, we can see that the three clusters roughly follow the larger land masses.

km = kmeans(mat,centers=3)$cluster
clust$cl <- factor( km)  
qplot(data=clust, x=Longitude,y=Latitude, color=cl)+ theme_solarized() + ggtitle("Clustering: Latitude vs. Longitude")


3D ScatterPlot

Instead of using the full data set, these 3D scatter plots use only class H5 meteorites from 1900 onward. Most of them were found in recent years.

smol = lands[lands$Year >= 1900,]  # compare as numbers; "1900" would force a string comparison
smol = smol[smol$Class == "H5",]

scatter3D(smol$Longitude,smol$Latitude,smol$Year, 
          main="Latitude vs. Longitude vs. Year",
          xlab = "Longitude",
          ylab = "Latitude",
          zlab = "Year")

mats <- cbind(smol$Longitude,smol$Latitude,smol$Year)  # matrix of the three clustering variables

km = kmeans(mats,centers=3)$cluster
smol$cl <-  km  

scatter3D(smol$Longitude,smol$Latitude,smol$Year, colvar=smol$cl,
          main="Latitude vs. Longitude vs. Year",
          xlab = "Longitude",
          ylab = "Latitude",
          zlab = "Year")
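
The clustering above combines degrees (Longitude, Latitude) and years in one distance measure, so the variable with the widest numeric range has the most influence. One common option, not used in the analysis above, is to standardize the columns before calling kmeans. The following is a minimal sketch on simulated data; the toy values, the seed, and the names toy, scaled and km are assumptions for illustration.

```r
set.seed(42)  # arbitrary seed, only so this sketch is reproducible

# Simulated coordinates in three loose groups, plus random years.
toy <- data.frame(
  Longitude = c(rnorm(20, -100, 5), rnorm(20, 20, 5), rnorm(20, 140, 5)),
  Latitude  = c(rnorm(20, 40, 5), rnorm(20, 0, 5), rnorm(20, -30, 5)),
  Year      = sample(1900:2013, 60, replace = TRUE)
)

scaled <- scale(toy)   # center each column to mean 0 and scale it to sd 1

# Cluster on the standardized columns; nstart repeats the random initialization.
km <- kmeans(scaled, centers = 3, nstart = 25)
toy$cl <- factor(km$cluster)

table(toy$cl)          # number of points assigned to each cluster
```

Whether scaling is appropriate depends on the question: it treats a one-standard-deviation difference in Year as equal to a one-standard-deviation difference in position, which may or may not be the intended notion of similarity.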